JCO Clinical Cancer Informatics
● American Society of Clinical Oncology (ASCO)
All preprints, ranked by how well they match the content profile of JCO Clinical Cancer Informatics, based on 18 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is an above-average fit. Older preprints may already have been published elsewhere.
Jun, H.; Tanaka, Y.; Johri, S.; Carvalho, F. L.; Jordan, A. C.; Labaki, C.; Nagy, M.; O'Meara, T. A.; Pappa, T.; Pimenta, E. M.; Saad, E.; Yang, D. D.; Gillani, R.; Tewari, A. K.; Reardon, B.; Van Allen, E. M.
The rapid expansion of molecularly informed therapies in oncology, coupled with evolving FDA regulatory approvals, poses a challenge for oncologists seeking to integrate precision cancer medicine into patient care. Large Language Models (LLMs) have demonstrated potential for clinical applications, but their reliance on general knowledge limits their ability to provide up-to-date and niche treatment recommendations. To address this challenge, we developed a RAG-LLM workflow augmented with the Molecular Oncology Almanac (MOAlmanac), a curated precision oncology knowledge resource, and evaluated this approach against alternative frameworks (i.e., LLM-only) in making biomarker-driven treatment recommendations using both unstructured and structured data. We evaluated performance across 234 therapy-biomarker relationships. Finally, we assessed real-world applicability of the workflow by testing it on actual queries from practicing oncologists. While the LLM-only approach achieved 62-75% accuracy in biomarker-driven treatment recommendations, RAG-LLM achieved 79-91% accuracy with an unstructured database and 94-95% accuracy with a structured database. Beyond accuracy, structured context augmentation significantly increased precision (49% to 80%) and F1-score (57% to 84%) compared with unstructured data augmentation. On queries provided by practicing oncologists, RAG-LLM achieved 81-90% accuracy. These findings demonstrate that the RAG-LLM framework effectively delivers precise and reliable FDA-approved precision oncology therapy recommendations grounded in individualized clinical data, and they highlight the importance of integrating a well-curated, structured knowledge base in this process. While our RAG-LLM approach significantly improved accuracy compared to standard LLMs, further work is needed to improve the generation of reliable responses for ambiguous or unsupported clinical scenarios.
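The retrieval step described above can be sketched minimally as follows. The toy knowledge-base entries, the exact-match retrieval rule, and the prompt format are illustrative assumptions, not the MOAlmanac schema or the authors' pipeline:

```python
# Minimal sketch of structured retrieval-augmented prompting for
# biomarker-driven therapy lookup. Entries and prompt format are hypothetical.
KNOWLEDGE_BASE = [
    {"biomarker": "BRAF V600E", "cancer": "melanoma", "therapy": "dabrafenib + trametinib"},
    {"biomarker": "EGFR L858R", "cancer": "NSCLC", "therapy": "osimertinib"},
    {"biomarker": "ERBB2 amplification", "cancer": "breast", "therapy": "trastuzumab"},
]

def retrieve(biomarker, cancer):
    """Return structured entries matching the patient's biomarker and cancer type."""
    return [e for e in KNOWLEDGE_BASE
            if e["biomarker"] == biomarker and e["cancer"] == cancer]

def build_prompt(biomarker, cancer):
    """Assemble an LLM prompt grounded in the retrieved structured context."""
    context = retrieve(biomarker, cancer)
    if not context:
        # Instructing the model to answer "unsupported" mirrors the paper's
        # concern about ambiguous or unsupported clinical scenarios.
        return f"No curated entry for {biomarker} in {cancer}; answer 'unsupported'."
    lines = [f"- {e['biomarker']} ({e['cancer']}): {e['therapy']}" for e in context]
    return ("Curated context:\n" + "\n".join(lines) +
            f"\nQuestion: recommended therapy for {biomarker} in {cancer}?")
```

Grounding the prompt in retrieved structured entries, rather than free text, is what the abstract credits for the precision and F1 gains.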
Lee, M. H.; Xiao, Y.; Li, X.; Klee, E.; Yang, P.; Sio, T.; Wang, L.; Cerhan, J. R.; Zong, N.
Background: Electronic health record (EHR)-based prognostic modeling is increasingly used in oncology, yet incorporating pharmacogenomic (PGx) knowledge derived from experimental systems into clinical prediction frameworks remains challenging. This gap is driven by fundamental mismatches between controlled drug-mutation assays and heterogeneous, incomplete real-world clinical data. Methods: We propose a representation transfer framework that integrates PGx embeddings learned from large-scale in vitro pharmacogenomic screens into patient-level EHR models. A frozen pharmacogenomic encoder is used to generate interaction-aware embeddings from patient mutation profiles and administered therapies, which are aggregated into a fixed-length PGx Complementarity Representation. These representations are incorporated into multimodal survival prediction models alongside standard clinical features. Performance was evaluated using systematic modality ablation analyses, attribution analyses, and exploratory unsupervised representation analyses. Results: Integrating PGx embeddings yielded consistent performance improvements across all evaluated modality combinations. Relative gains were largest in modality-sparse settings, where baseline EHR features encode limited biological context, and were attenuated, but remained significant, in biologically enriched configurations. Attribution analyses indicated that PGx embeddings contributed non-redundant predictive signal beyond standard clinical features. Exploratory unsupervised analyses further demonstrated that the learned representations exhibit interpretable association patterns aligned with known therapeutic exposures and pathway-level associations. Conclusion: These findings suggest that externally learned pharmacogenomic representations can be transferred into real-world EHR models as a context-dependent, non-redundant augmentation. By framing PGx knowledge as an interaction-aware representation rather than a mechanistic model, this work provides an informatics framework for integrating experimental pharmacogenomic data into clinical prediction tasks in a reproducible and interpretable manner.
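The aggregation step (frozen encoder, then pooling into a fixed-length representation) can be sketched as below. The stub encoder, its dimensionality, and mean pooling are illustrative assumptions; the actual encoder is learned from in vitro screens:

```python
# Sketch: aggregate per-interaction PGx embeddings into a fixed-length
# patient representation by mean pooling. The "frozen encoder" here is a
# deterministic stub, NOT the learned model from the paper.
def frozen_encoder(mutation, drug, dim=4):
    """Stand-in for a frozen PGx encoder: deterministic pseudo-embedding in [0, 1]."""
    seed = sum(ord(c) for c in mutation + drug)
    return [((seed * (i + 1)) % 16) / 15.0 for i in range(dim)]

def pgx_representation(mutations, drugs, dim=4):
    """Mean-pool embeddings over all mutation-drug pairs; zeros if no pairs."""
    pairs = [(m, d) for m in mutations for d in drugs]
    if not pairs:
        return [0.0] * dim
    embs = [frozen_encoder(m, d, dim) for m, d in pairs]
    return [sum(col) / len(embs) for col in zip(*embs)]
```

The fixed length is what lets the pooled vector be concatenated with standard clinical features in a downstream survival model.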
Petalcorin, M. I. R.
Background: Early-phase oncology development increasingly depends on integrated interpretation of clinical outcomes, translational biomarkers, and pharmacokinetic exposure rather than toxicity alone. This shift has created a need for reproducible analytical workflows that can combine heterogeneous trial data into traceable, analysis-ready outputs suitable for exploratory review and early decision support. Objective: To develop a reproducible Python-based workflow that simulates a plausible early-phase oncology study, integrates clinical, biomarker, and pharmacokinetic data, and generates analysis-ready datasets, visual summaries, and exploratory predictive models relevant to early development analytics. Methods: A workflow was constructed to simulate an early-phase oncology cohort of 120 patients distributed across multiple dose levels. Three synthetic raw data sources were generated, including patient-level clinical data, baseline biomarker data, and longitudinal pharmacokinetic profiles. These sources were merged into a single analysis-ready dataset containing derived variables such as tumor percent change from baseline, clinical-benefit status, exposure summaries, adverse-event indicators, and survival outcomes. The workflow produced structured tables, patient listings, waterfall plots, Kaplan-Meier-style survival curves, biomarker-response visualizations, pharmacokinetic profile plots, and exploratory machine-learning outputs. Results: The final integrated dataset contained 120 patients and 30 variables. Median survival across the simulated cohort was 243.8 days, and higher dose groups showed improved median survival and greater clinical benefit relative to the low-dose group. Clinical benefit increased from 8.6% in the low-dose group to 29.0% in the medium-dose group and 45.2% in the high-dose group. 
Higher baseline LDH, CRP, and ctDNA fraction tracked with less favorable tumor-response trajectories, whereas higher exposure, reflected by AUC and Cmax, associated with improved disease control. Pharmacokinetic profiles showed clear dose-dependent separation. Grade 3 or higher adverse-event rates remained within a plausible exploratory range across dose groups. A random-forest model for clinical benefit achieved an exploratory ROC AUC of 0.845, while a logistic-regression model for strict responder status could not be fit because no simulated patient met the prespecified objective response threshold. Conclusions: This proof-of-concept demonstrates that a transparent Python workflow can generate a coherent early-phase oncology analytical ecosystem from synthetic inputs. The workflow supports integration of heterogeneous data streams, derivation of analysis-ready variables, production of interpretable outputs, and exploratory modeling in a reproducible framework. Although the simulated responder prevalence was too low to support objective response modeling, this limitation itself highlights the importance of simulation calibration for downstream analytical validity. The framework provides a practical Health Informatics demonstration of how early oncology trial data can be structured and analyzed for exploratory translational decision support.
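The cohort-simulation step can be sketched as follows; the dose-group benefit probabilities below are illustrative stand-ins, not the study's calibrated values:

```python
# Toy re-creation of the simulation step: assign 120 patients to dose groups
# and compute clinical-benefit rates per group. Probabilities are assumptions.
import random

def simulate_cohort(n=120, seed=7):
    random.seed(seed)
    benefit_prob = {"low": 0.10, "medium": 0.30, "high": 0.45}  # hypothetical
    doses = ["low", "medium", "high"]
    cohort = []
    for i in range(n):
        dose = doses[i % 3]  # round-robin allocation across dose levels
        cohort.append({"id": i, "dose": dose,
                       "benefit": random.random() < benefit_prob[dose]})
    return cohort

def benefit_rate(cohort, dose):
    """Observed clinical-benefit proportion within one dose group."""
    group = [p for p in cohort if p["dose"] == dose]
    return sum(p["benefit"] for p in group) / len(group)
```

Fixing the seed is what makes such a workflow deterministically reproducible, the property the abstract emphasizes; the same mechanism also exposes the calibration problem it reports (too few simulated responders to fit one of the models).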
Dickerson, J. C.; McClure, M. B.; Shaw, M.; Reitsma, M. B.; Dalal, N. H.; Kurian, A. W.; Caswell-Jin, J. L.
Background: Manual chart abstraction is a major bottleneck in clinical research. In oncology, important outcomes such as disease recurrence and treatment history are often documented only in clinical notes, limiting the scale and quality of observational and epidemiologic studies. We developed an open-source pipeline that, in a HIPAA-compliant setting, can use any commercially available large language model (LLM) to determine whether variables from complex longitudinal oncology records can be abstracted with performance similar to that of expert medical oncologists. Methods: We randomly selected 100 patients from an institutional breast cancer cohort enriched for complex care. We abstracted a range of key variables from unstructured data, including dates of diagnosis and recurrence, clinical stage, biomarker subtype, genetic testing results, and prescribed systemic therapies, including treatment timing, intent, and reason for discontinuation. The inputs to the LLM were unnormalized, unlabeled, and unedited clinical notes, pathology reports, medication administration records, and demographics. Breast oncologists abstracted the same variables to create the reference standard. For systemic therapy extraction, a second oncologist and research coordinators served as comparators. In addition to variable-level performance, we examined whether survival and hazard-ratio estimates were similar for fully LLM-derived datasets compared with expert-derived datasets. Results: Among 100 patients, the median chart had more than 3,100 pages of text; patients received a median of 7 lines of therapy over 6.5 years of follow-up. The best-performing LLM achieved 99% concordance with the expert for recurrence status, 100% for germline BRCA1/2 pathogenic variant detection, 99% for hormone receptor status, 96% for HER2 status, 91% for clinical stage, 91% for PIK3CA mutation status, and 90% for ESR1 mutation status.
For anti-cancer drug extraction, the best-performing LLM approached inter-oncologist variability. For exact therapy-line reconstruction, mean patient-level performance remained 9 percentage points lower than the second oncologist, although inter-LLM disagreement was similar to inter-oncologist disagreement. All four LLMs tested outperformed the research coordinators on systemic therapy abstraction. Recurrence-free survival, overall survival, and hazard ratio estimates were similar between expert-derived and LLM-derived datasets. In an external cohort of 97 young patients with early-stage breast cancer, the unmodified pipeline showed similar performance for recurrence detection and adjuvant endocrine therapy use. Conclusions: Off-the-shelf LLMs in a fixed retrieval pipeline were able to abstract a range of variables from complex longitudinal oncology records with performance approaching inter-oncologist variability for key tasks, without any fine-tuning or institution-specific retraining. This approach offers a practical path to scaling the creation of research-grade retrospective datasets from narrative medical records.
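The variable-level concordance metric reported above reduces to a simple per-patient agreement fraction; the field names and example values below are hypothetical:

```python
# Sketch of variable-level evaluation: fraction of patients where the
# LLM-abstracted value matches the oncologist reference standard.
def concordance(llm, reference):
    """Agreement fraction between two {patient_id: value} dicts, keyed by reference."""
    agree = sum(llm.get(k) == v for k, v in reference.items())
    return agree / len(reference)

# Hypothetical recurrence-status abstractions for three patients.
expert = {"pt1": "recurred", "pt2": "no_recurrence", "pt3": "no_recurrence"}
llm_out = {"pt1": "recurred", "pt2": "no_recurrence", "pt3": "recurred"}
```

The same function applied to two oncologists' outputs gives the inter-oncologist variability that the paper uses as its performance ceiling.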
McInerney, S.; Gurku, H.; Balasubramanian, R.; Vikram, P.; Bhaskaran, S.; Sekaran, K.
Objectives: To evaluate the performance of SROTAS IQ, a custom fine-tuned large language model (LLM), in automating clinical trial eligibility screening for breast cancer patients using synthetic data. Methods: Ten breast cancer trials were selected across diverse treatment settings and molecular subtypes. Fifteen synthetic patient summaries per trial were generated, including realistic and enriched eligibility scenarios. Two independent oncologists assessed trial eligibility for each patient, establishing ground truth. The SROTAS IQ LLM was evaluated against expert consensus using standard classification metrics. Time-to-verdict was measured to compare clinician effort with automated assessment. Results: SROTAS IQ demonstrated strong concordance with expert assessments, achieving 90% or greater accuracy in 5 of 10 trials. Across 150 patient-trial evaluations, the model correctly classified 88% of overall eligibility decisions. Performance was highest in trials with moderate complexity and fewer nested criteria, while more intricate protocols showed reduced accuracy. The LLM consistently delivered rapid assessments (<0.5 minutes per patient), with explainable outputs that aligned with clinical reasoning. These findings underscore the model's potential to support high-fidelity, scalable trial matching in oncology. Conclusion: SROTAS IQ offers a promising approach to automating clinical trial matching in oncology. Further real-world validation is needed to confirm generalisability and integration into clinical practice.
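The "standard classification metrics" used for such eligibility evaluations can be computed from the confusion-matrix counts; the eligible/ineligible verdicts below are illustrative:

```python
# Accuracy, precision, recall, and F1 from binary eligibility verdicts
# (True = eligible) versus oncologist consensus.
def classification_metrics(pred, truth):
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```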
Soltanifar, M.; Portuguese, A. J.; Jeon, Y.; Gauthier, J.; Lee, C. H.
Oncology research and clinical practice in North America increasingly rely on complex endpoints, heterogeneous study designs, and high-dimensional molecular data. In this landscape, data visualization serves as a critical analytic instrument for study design communication, model diagnostics, safety reporting, and real-time clinical decision support. Despite its importance, the oncology visualization ecosystem remains fragmented across commercial platforms and bespoke scripts, lacking a unified, code-first reference that emphasizes reproducibility and auditability in the R programming environment. This paper addresses this gap by presenting a North American collaborative atlas of 62 oncology visualization templates: 24 for clinical trials, 12 for real-world evidence (RWE), and 26 common to both settings. A core innovation of this atlas is its simulation-driven approach; each plot is illustrated using transparent, reproducible data-generating mechanisms. This allows users to deterministically recreate figures and easily adapt templates to alternative endpoints, censoring patterns, and subgroup structures. The paper provides foundational notation for oncology endpoints, an operational taxonomy based on data geometry, and a consolidated review of relevant R software. We further synthesize the practical utility of these methods through four representative case studies and provide a comparative analysis of the strengths, limitations, and future challenges of oncology data visualization. A detailed tutorial on fishplot is included to demonstrate a publication-ready workflow for clonal evolution.
Shady, M.; Reardon, B.; Jiang, S.; Pimenta, E.; O'Meara, T.; Park, J.; kehl, K. L.; Elmarakeby, H. A.; Sunyaev, S. R.; Van Allen, E. M.
Introduction: Precision oncology has informed cancer care by enabling the discovery and application of diagnostic, prognostic, and/or predictive molecular biomarkers. However, many patients lack actionable biomarkers or fail to respond to biomarker-directed therapies. Patient similarity approaches can leverage comprehensive tumor profiling and prior clinical experiences from large cohorts for decision support, facilitating broader realization of precision oncology insights. Methods: We developed a deep learning-based modeling framework using real-world clinicogenomic data from a tertiary cancer center to (i) measure patient similarity based on embedded tumor genomic profiles and (ii) evaluate the association of derived patient subgroups and neighborhoods with shared therapeutic outcomes in breast cancer-specific and histology-agnostic pan-cancer settings. Results: The model recovered clinically meaningful patient clusters reflecting both expected and previously unknown therapeutic associations, as well as patient-specific neighborhoods that could inform therapeutic trajectories more often than expected by chance in multiple clinical contexts. Moreover, model utility extended to patients without actionable genomic biomarkers and those with cancer of unknown primary (CUP) diagnoses, where neighborhoods aligned with independently predicted primary cancer type. These neighborhoods could also be examined over time in a continuously learning scenario. Conclusion: This similarity-based modeling framework distilled complex molecular and clinical data into concise, context-specific insights that augment clinician judgment, providing a foundation for a real-time learning, patient-centered decision support model in precision oncology.
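The neighborhood-retrieval idea (nearest patients by similarity of embedded genomic profiles) can be sketched with cosine similarity; the embedding vectors and patient IDs below are invented for illustration, and the paper's learned embeddings and similarity measure may differ:

```python
# Sketch: retrieve a patient's nearest neighborhood by cosine similarity
# over (hypothetical) embedded tumor genomic profiles.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def neighborhood(query_id, embeddings, k=2):
    """Return the k patient IDs most similar to query_id (excluding itself)."""
    q = embeddings[query_id]
    scored = [(cosine(q, v), pid) for pid, v in embeddings.items() if pid != query_id]
    return [pid for _, pid in sorted(scored, reverse=True)[:k]]
```

A neighborhood retrieved this way can then be summarized by the neighbors' observed therapeutic outcomes, which is the decision-support signal the abstract describes.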
Windisch, P.; Weyrich, J.; Dennstaedt, F.; Zwahlen, D. R.; Foerster, R.; Schroeder, C.
Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005-2023) and provided models with only the title and abstract. Trials were labeled according to whether they allowed the inclusion of patients with localized and/or metastatic disease. Three flagship models (GPT-5.2, Gemini 3 Flash, Claude Opus 4.5) were queried with default settings in two independent conditions: label-only and label plus a verbatim supporting quote. Models could abstain if they deemed the abstract not to contain sufficient information. Each condition was repeated three times per abstract. Quotes were mechanically validated as exact substrings after whitespace normalization, and a separate judge step used an LLM to rate whether each quote supported the assigned label. Results: Evidence requirements modestly reduced coverage (GPT-5.2 86.2% to 84.3%, Gemini 98.3% to 92.8%, Claude 96.0% to 94.5%) by increasing abstentions and, for Gemini, invalid outputs. Conditional macro-F1 remained high but varied by model (slight gains for GPT-5.2 and Gemini, a decrease for Claude). Labels were stable across repetitions (Fleiss kappa 0.829 to 0.969). Mechanically valid quotes occurred in 83.3% to 91.2% of runs, yet only 48.0% to 78.8% of evidence-bearing predictions were judged semantically supported. Restricting to supported predictions increased macro-F1 at the cost of lower coverage. Conclusion: Substring-verifiable quotes provide an automated audit trail and enable selective, higher-trust automation when applying LLMs to biomedical text processing. However, this approach introduces new failure modes and trades coverage for verifiability in a model-dependent way.
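The mechanical validation step described above (exact-substring check after whitespace normalization) is simple enough to sketch directly; the exact normalization the authors used may differ in detail:

```python
# Sketch of the mechanical quote check: a quote is valid only if, after
# whitespace normalization, it is an exact substring of the abstract.
# The separate LLM "judge" step for semantic support is not reproduced here.
def normalize_ws(text):
    """Collapse all runs of whitespace to single spaces and trim."""
    return " ".join(text.split())

def quote_is_valid(quote, abstract):
    return normalize_ws(quote) in normalize_ws(abstract)
```

Note the gap the paper quantifies: a quote can pass this mechanical check while still failing to semantically support the assigned label.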
Salome, P.; Knoll, M.; Walz, D.; Cogno, N.; Dedeoglu, A. S.; Qi, A. L.; Isakoff, S. J.; Abdollahi, A.; Jimenez, R. B.; Bitterman, D. S.; Paganetti, H.; Chamseddine, I.
Introduction: Manual data extraction from unstructured clinical notes is labor-intensive and impractical for large-scale clinical and research operations. Existing automated approaches typically require large language models, dedicated computational infrastructure, and/or task-specific fine-tuning that depends on curated data. The objective of this study is to enable accurate extraction with smaller, locally deployed models using a disease-site-specific pipeline and prompt configuration that are optimized and reusable. Materials/Methods: We developed OncoRAG, a four-phase pipeline that (1) generates feature-specific search terms via ontology enrichment, (2) constructs a clinical knowledge graph from notes using biomedical named entity recognition, (3) retrieves relevant context using graph-diffusion reranking, and (4) extracts features via structured prompts. We ran OncoRAG using Microsoft Phi-3-medium-instruct (14B parameters), a mid-size language model deployed locally via Ollama. The pipeline was applied to three cohorts: triple-negative breast cancer (TNBC; n=104 patients, 42 features; primary development), recurrent high-grade glioma (RiCi; n=191 patients, 19 features; cross-lingual validation in German), and MIMIC-IV (n=100 patients, 10 features; external testing). Downstream task utility was assessed by comparing survival models for 3-year progression-free survival built from automatically extracted versus manually curated features. Results: The pipeline achieved mean F1 scores of 0.80 ± 0.07 (TNBC; n=44 patients, 42 features), 0.79 ± 0.12 (RiCi; n=61 patients, 19 features), and 0.84 ± 0.06 (MIMIC-IV; n=100 patients, 10 features) on test sets under the automatic configuration. Compared to direct LLM prompting and naive RAG baselines, OncoRAG improved the mean F1-score by 0.19 to 0.22 and 0.17 to 0.19, respectively. Manual configuration refinement further improved the F1-score to 0.83 (TNBC) and 0.81 (RiCi), with no change in MIMIC-IV. Extraction time averaged 1.7-1.9 seconds per feature with the 14B model. Substituting a smaller 3.8B model reduced extraction time by 57%, with a decrease in F1-score (0.03-0.10). For TNBC, extraction time was reduced from approximately two weeks of manual abstraction to under 2.5 hours. In an exploratory survival analysis, models using automatically extracted features showed a C-index comparable to those with manual curation (0.77 vs 0.76; 12 events). Conclusions: OncoRAG, deployed locally using a mid-size language model, achieved accurate feature extraction from multilingual oncology notes without fine-tuning. It was validated against manual extraction for both retrieval accuracy and survival model development. This locally deployable approach, which requires no external data sharing, addresses a critical bottleneck in scalable oncology research.
Xu, S.; Wang, Z.; Wang, H.; Ding, Z.; Zou, Y.; Cao, Y.
Online cancer peer-support communities generate large volumes of patient-authored and caregiver-authored text that may reflect distress, coping, and informational needs. Automated emotional tone classification could support scalable monitoring, but supervised modeling depends on label quality and may benefit from explicit context features. Using the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset, we compared five model families (TF-IDF Logistic Regression, Random Forest, LightGBM, GRU, and fine-tuned ALBERT) on a three-class target (Negative/Neutral/Positive) derived from four original categories. We introduced two extensions: (i) LLM-based annotation to generate parallel "AI labels" and (ii) token-based augmentation that prepends LLM-extracted structured variables (reporter role and cancer type) to the post text. Models were trained with a 60/20/20 stratified train/validation/test split, with hyperparameters selected on validation data only. Test performance was summarized using weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals, with paired comparisons based on McNemar tests and false discovery rate adjustment. The LLM annotator produced substantial redistribution in the four-class label space, shifting prevalence toward very negative relative to the original labels; the shift persisted but attenuated after collapsing to three classes. Across all model families, token augmentation improved held-out performance, with the largest gains for GRU and consistent improvements for ALBERT. Augmentation also reduced polarity-reversing errors (Negative ↔ Positive) for ALBERT, while adjacent errors (Negative ↔ Neutral) remained the dominant residual failure mode.
These results indicate that LLM-based supervision can introduce systematic measurement shifts that require auditing, yet LLM-extracted context incorporated via simple token augmentation provides a pragmatic, model-agnostic mechanism to improve downstream emotional tone classification for supportive oncology decision support. Author summary: We studied how to better monitor emotional tone in posts from online cancer peer-support communities, where patients and caregivers share experiences that may signal distress, coping, or unmet needs. Automated classification could help organizations and moderators identify when additional support may be needed, but these systems depend on the quality of the labels used for training and may miss clinical context. Using a public dataset of cancer survivor and caregiver posts, we trained and compared several machine-learning and deep-learning models to classify each post as negative, neutral, or positive. We tested two practical improvements. First, we used a large language model to generate an additional set of "AI labels" and examined how these differed from the original categories. Second, we extracted simple context information--whether the writer was a patient or caregiver and what cancer type was mentioned--and added this context to the text before model training. We found that adding context consistently improved performance across model types. However, the AI-generated labels shifted class distributions, indicating that automated labeling can introduce systematic changes that should be audited. Overall, simple context extraction can make emotional tone monitoring more accurate and useful for supportive oncology decision support.
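The token-based augmentation is the simplest of the two extensions and can be sketched directly; the bracketed token format below is an illustrative choice, not necessarily the paper's exact encoding:

```python
# Sketch of token-based augmentation: prepend LLM-extracted structured
# variables (reporter role, cancer type) as special tokens before the post.
def augment_post(text, role=None, cancer_type=None):
    tokens = []
    if role:
        tokens.append(f"[ROLE={role.upper()}]")
    if cancer_type:
        tokens.append(f"[CANCER={cancer_type.upper()}]")
    return " ".join(tokens + [text])
```

Because the context is injected into the input text itself, the mechanism is model-agnostic: the same augmented strings feed TF-IDF models, a GRU, or ALBERT unchanged.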
Rossi, L. A.; Roberts, L. M.; Zachariah, F. J.
Prognostication in oncology is increasingly difficult due to the rapid evolution of therapies with significant improvements in survival. Accurate prognostication is essential for providing optimal, value-driven end-of-life care for cancer patients, and can promote goals-of-care (GOC) conversations with the potential to minimize chemotherapy or ICU utilization in the last weeks of life and to increase hospice admission and length of stay.1 There are several recent publications on the application of machine learning to prognostication.2,3 We developed a 90-day mortality prediction model trained on Electronic Health Record (EHR) data. After a non-interventional pilot stage, we deployed the model in February 2021 in the real-time Epic EHR infrastructure of our cancer center. Here we present the model, evaluate its overall performance over the first 7.5 months since go-live, and outline our evaluation process for the next stages.
Petalcorin, M. I. R.
Background: Modern oncology development depends on integrating radiographic response, molecular biomarkers, treatment exposure, safety, and survival endpoints, yet access to well-structured patient-level trial data is often limited. Methods: We developed a synthetic, literature-informed phase II randomized oncology trial framework that followed the sequence Patient → Data → Dataset → Analysis → Tables/Figures → Decision. A cohort of randomized patients was simulated with baseline demographic and disease features, longitudinal tumor measurements, circulating tumor DNA, inflammatory and exploratory biomarkers, adverse events, treatment exposure, and survival outcomes. Raw source datasets were transformed into SDTM-like domains and ADaM-like analysis datasets, then analyzed for baseline characteristics, exposure, best overall response, survival, subgroup hazard ratios, longitudinal tumor and biomarker changes, exposure-response, and safety. Results: The treatment arm showed a coherent efficacy signal across multiple analytical layers. Treatment increased objective response and clinical benefit, reduced tumor burden over time, and prolonged survival. Median overall survival increased from 135 days in the control arm to 288 days in the treatment arm, with an approximate hazard ratio of 0.661 (95% CI, 0.480-0.911; p = 0.011). Median progression-free survival increased from 116 to 208 days, with an approximate hazard ratio of 0.601 (95% CI, 0.418-0.864; p = 0.006). Circulating tumor DNA showed a more favorable trajectory in treated patients and aligned directionally with radiographic and survival benefit. Safety analyses showed increased treatment-related toxicity, but the overall safety profile remained interpretable and compatible with continued development.
Conclusions: This study demonstrates that a synthetic, literature-informed oncology trial can reproduce a biologically plausible and analytically coherent efficacy-safety signal architecture across radiographic, molecular, and time-to-event endpoints, providing a decision-oriented prototype for translational oncology clinical data science. Keywords: synthetic clinical trial, oncology, ctDNA, Kaplan-Meier, biomarker, survival analysis, translational data science, ADaM, SDTM
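The survival endpoints above rest on the Kaplan-Meier product-limit estimator, which is compact enough to sketch in pure Python; the times and event flags in the test are invented toy data, not the study's:

```python
# Pure-Python Kaplan-Meier product-limit estimator.
# events: 1 = death observed at that time, 0 = censored.
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with observed deaths."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, e in data if tt == t)  # deaths + censored at t
        if deaths:
            surv *= 1 - deaths / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve
```

This follows the usual convention that subjects censored at time t are still counted at risk for deaths occurring at t. Median survival is then read off as the earliest time at which the curve drops to 0.5 or below.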
Stillwell, R. C.
Background: The integration of multi-modal genomic data for cancer classification remains challenging in precision oncology. While machine learning approaches have shown promise, there is a gap between research prototypes and systems with the comprehensive infrastructure required for clinical deployment. Methods: I developed Cancer Alpha, an AI system that integrates data from the TCGA, GEO, ENCODE, and ICGC ARGO databases for multi-modal cancer classification. The system combines state-of-the-art multi-modal transformer architectures with production infrastructure, including containerized deployment, monitoring systems, and security frameworks. I implemented a Multi-Modal Transformer (MMT) architecture incorporating cross-modal attention mechanisms, TabTransformer for structured genomic data, and Perceiver IO for high-dimensional omics integration. Results: In synthetic benchmark tests, Cancer Alpha achieved high performance, with ensemble models reaching 99% accuracy on optimized datasets. The system includes production infrastructure with Docker containerization, Kubernetes orchestration, CI/CD pipelines, and monitoring capabilities using Prometheus and Grafana. The platform provides a web interface and a RESTful API for potential clinical integration. Conclusions: Cancer Alpha demonstrates the feasibility of developing production-ready infrastructure for multi-modal cancer classification. The platform's comprehensive architecture may facilitate future clinical validation and deployment in precision oncology applications, pending validation with real-world clinical data.
Gallifant, J.; Afshar, M.; Ameen, S.; Aphinyanaphongs, Y.; Chen, S.; Cacciamani, G.; Demner-Fushman, D.; Dligach, D.; Daneshjou, R.; Fernandes, C.; Hansen, L. H.; Landman, A.; McCoy, L. G.; Miller, T.; Moreno, A.; Munch, N.; Restrepo, D.; Savova, G.; Umeton, R.; Gichoya, J. W.; Collins, G. S.; Moons, K. G. M.; Celi, L. A.; Bitterman, D. S.
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare through comprehensive reporting. Competing interests: DSB: Editorial, unrelated to this work: Associate Editor of Radiation Oncology, HemOnc.org (no financial compensation); Research funding, unrelated to this work: American Association for Cancer Research; Advisory and consulting, unrelated to this work: MercurialAI. DDF: Editorial, unrelated to this work: Associate Editor of JAMIA, Editorial Board of Scientific Data, Nature; Funding, unrelated to this work: the intramural research program at the U.S. National Library of Medicine, National Institutes of Health. JWG: Editorial, unrelated to this work: Editorial Board of Radiology: Artificial Intelligence, British Journal of Radiology AI journal and NEJM AI. All other authors declare no conflicts of interest.
Figaschewski, M.; Sürün, B.; Tiede, T.; Kohlbacher, O.
Background: Personalized oncology represents a shift in cancer treatment from conventional methods to targeted therapies, where decisions are made based on the patient-specific tumor profile. Selection of the optimal therapy relies on a complex interdisciplinary analysis and interpretation of somatic variants by experts in molecular tumor boards. With up to hundreds of somatic variants identified in a tumor, this process requires visual analytics tools to guide and accelerate the annotation process. Results: The Personal Cancer Network Explorer (PeCaX) is a visual analytics tool supporting the efficient annotation, navigation, and interpretation of somatic genomic variants through functional annotation, drug target annotation, and visual interpretation within the context of biological networks. Starting with somatic variants in a VCF file, PeCaX enables users to explore these variants through a web-based graphical user interface. Its most prominent feature is the combination of clinical variant annotation and gene-drug networks in an interactive visualization, which reduces the time and effort needed to arrive at a treatment suggestion and helps generate new hypotheses. PeCaX is provided as a platform-independent containerized software package for local or institution-wide deployment and is available for download at https://github.com/KohlbacherLab/PeCaX-docker.
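PeCaX starts from somatic variants in a VCF file. As a rough illustration of that input format (this is not PeCaX's actual parser, and the example record is hypothetical), a minimal sketch of pulling the fixed variant columns out of a VCF:

```python
# Minimal sketch of reading somatic variants from a VCF file, the input format
# PeCaX-style tools consume. Illustration of the VCF layout only: meta lines
# start with "##", the header with "#CHROM", and records are tab-separated
# with CHROM, POS, ID, REF, ALT as the first five fields.

def parse_vcf_records(lines):
    """Yield (chrom, pos, ref, alt) tuples from the body lines of a VCF."""
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt

example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t140453136\t.\tA\tT\t60\tPASS\tSOMATIC",  # hypothetical somatic SNV
]
variants = list(parse_vcf_records(example))
```

A real tool would additionally parse the INFO and FORMAT columns, which carry the annotations this sketch ignores.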
Makani, A.
Medical oncology education faces a dual crisis: knowledge velocity that outpaces static curricula and large language model (LLM) risks--hallucination and automation bias--that threaten the fidelity of AI-assisted learning. We present Onco-Shikshak V7, an AI-native adaptive learning platform that addresses both challenges through a unified cognitive architecture grounded in learning science. The system replaces isolated educational modules with four authentic clinical workflows--Morning Report, Tumor Board, Clinic Day, and AI Textbook--each scaffolded by a nine-module pedagogy engine that integrates ACT-R activation dynamics (illness scripts), Item Response Theory (adaptive difficulty), the Free Spaced Repetition Scheduler (FSRS v4), Zone of Proximal Development (scaffolding), and metacognitive calibration training (Brier score). Six specialist AI agents--medical oncology, radiation oncology, surgical oncology, pathology, radiology, and oncology navigation--engage in multi-disciplinary deliberation with per-specialty retrieval-augmented generation (RAG) grounding across nine authoritative guideline sources including NCCN, ESMO, and ASTRO. The platform provides 18 clinical cases with decision trees across six cancer types, maps every interaction to 13 ACGME Hematology-Oncology milestones, and implements four closed-loop feedback mechanisms that connect session errors to targeted flashcards, weak domains to suggested cases, and all interactions to a persistent learner profile. Technical validation confirms algorithmic correctness across eight subsystems. To our knowledge, this is the first system to unify ACT-R, IRT, FSRS, ZPD, and metacognitive calibration in a single medical education platform. Formal learner evaluation via randomized controlled trial is planned.
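The metacognitive calibration component above relies on the Brier score, the mean squared difference between a learner's stated confidence and the actual outcome. A small sketch of that metric (the data and variable names are hypothetical, not taken from Onco-Shikshak):

```python
# Brier score as used for metacognitive calibration: mean squared difference
# between stated confidence (0..1) and the outcome (1 = correct, 0 = wrong).
# Lower is better; a pure guesser at 0.5 confidence scores 0.25.

def brier_score(confidences, outcomes):
    return sum((p - o) ** 2 for p, o in zip(confidences, outcomes)) / len(outcomes)

# A well-calibrated learner scores near 0; overconfidence inflates the score.
well_calibrated = brier_score([0.9, 0.8, 0.2], [1, 1, 0])   # 0.03
overconfident   = brier_score([0.9, 0.9, 0.9], [1, 0, 0])   # ~0.54
```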
Vesteghem, C.; Dahl, S. C.; Broendum, R. F.; Soenderkaer, M.; Boedker, J. S.; Schmitz, A.; Weischenfeldt, J.; Pedersen, I. S.; Sommer, M.; Rytter, A. S.; Nielsen, M. M.; Ladekarl, M.; Severinsen, M. T.; Dybkaer, K.; Groenbaek, K.; El-Galaly, T.; Roug, A. S.; Boegsted, M.
Objectives: To facilitate clinical implementation and research in precision oncology, notably the pairing of patients, variants, and treatments to identify candidates for clinical trials, we have built a data infrastructure to 1) capture and store data, 2) reduce manual tasks for clinical and genomic data collection and management, and 3) combine data for quality control, reporting, and findability. Infrastructure: The infrastructure uses REDCap repositories to capture and store data. The structure of these repositories is customized for each project. Additionally, a cross-project web platform was developed using software development best practices and state-of-the-art web technologies to circumvent REDCap's limitations and integrate other third-party resources. Using REDCap's application programming interfaces, this platform allowed validation of data across multiple repositories, easy import of data from external sources, generation of overviews of included patients and available data, combination of genomic and clinical data to generate tumour board reports, and the findability of data. Its design was driven by data stewardship best practices. Usage: Across four precision medicine projects, the infrastructure has been used to collect data for 1921 patients, including 453 genomic data files. The custom-built web platform made it possible to import, validate, and present data in a comprehensive manner, including building tumour board reports for clinicians, combining clinical and genomic data, and providing search functionalities for researchers. Discussion: REDCap allowed us to capitalize on the numerous data capture and management features developed in this solution. Designing a cross-project platform guarantees long-term relevance, as developments can be mutualised across projects, and allowed us to make the overall solution more compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Further developments should be considered, notably automatic retrieval of data from electronic health records to limit the number of manual tasks. Conclusion: The proposed infrastructure allowed our precision oncology projects to gain efficiency in data collection and increase data quality by reducing manual work, and it gave researchers and clinicians straightforward, customized access to data.
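The cross-repository validation described above can be pictured as comparing the same patient's record across two project exports and flagging fields that disagree. A toy sketch under that reading (field names and data are hypothetical; the real platform works through REDCap's API, which is not shown here):

```python
# Toy sketch of cross-repository validation: compare records exported from two
# REDCap projects and report, per patient, the fields whose values disagree.
# Field names and values are hypothetical illustrations only.

def cross_validate(repo_a, repo_b):
    """Return {patient_id: [field, ...]} for fields that differ between repos."""
    mismatches = {}
    for pid, rec_a in repo_a.items():
        rec_b = repo_b.get(pid)
        if rec_b is None:
            continue  # patient present in only one repository; handled elsewhere
        diff = [f for f in rec_a if f in rec_b and rec_a[f] != rec_b[f]]
        if diff:
            mismatches[pid] = diff
    return mismatches

clinical = {"P001": {"sex": "F", "diagnosis": "AML"},
            "P002": {"sex": "M", "diagnosis": "DLBCL"}}
genomic  = {"P001": {"sex": "F", "diagnosis": "AML"},
            "P002": {"sex": "F", "diagnosis": "DLBCL"}}
issues = cross_validate(clinical, genomic)   # flags the sex mismatch for P002
```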
Kapilivsky, J.; Islam, F.; Roth, E. K.; Dow, J.; Moran, S.; Scherrer, E.; Hyun, S. W.; Sangli, C.
Purpose: Real-world data (RWD) from electronic health records (EHRs) and next-generation sequencing are increasingly used to study treatment effectiveness in molecularly refined patient populations. Incomplete mortality data in EHRs can overestimate survival rates in RWD studies. While the National Death Index (NDI) is the gold standard for mortality data in the United States, its limited accessibility and reporting delays hinder timely research. Instead, EHR datasets are often supplemented with external mortality data sources to improve mortality data capture. This study evaluated a composite mortality variable against NDI records using a large cohort of advanced cancer patients from a real-world oncology database. Methods: De-identified clinical and molecular data from patients with advanced solid tumors were linked with third-party mortality and claims datasets using deterministic tokenization. Vital status and death dates were harmonized across sources. Patient identifiers were submitted to NDI, and true matches were de-identified and joined for analysis. Performance metrics (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were calculated using NDI as ground truth. Date agreement was assessed at 0-, ±15-, and ±30-day tolerances. Subgroup analyses and a cumulative cases/dynamic controls (CC/DC) approach were also performed. Results: Among 17,597 patients, the composite mortality variable demonstrated 82% sensitivity and 95% specificity against NDI. PPV was 96%, and NPV was 77%. Exact date agreement was 86%, increasing to 94% within a ±15-day tolerance and 96% within a ±30-day tolerance. Incorporating third-party mortality and claims data substantially improved sensitivity from 17% (EHR alone) to 82%. Sensitivity remained stable across subgroups but showed variation by age, cancer type, geographic region, and race. With the CC/DC approach, sensitivity was 96% at 6 months, 97% at 12 months, and 98% at 24 months, with specificity above 98% across these timeframes. Conclusions: The composite mortality variable is a robust, reliable endpoint for real-world evidence analyses. Its high accuracy for identified deaths and appropriate censoring of lost-to-follow-up patients support its use in overall survival analyses. This validation is a foundational step towards high-quality research to improve patient outcomes and advance cancer drug development using this multimodal dataset. Clinical trial number: not applicable.
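The four performance metrics above all fall out of a single 2x2 confusion matrix against the NDI ground truth, and the date-agreement analysis is a tolerance check on the day difference. A sketch of both computations (the counts below are hypothetical, chosen only to illustrate the formulas, not the study's actual cell counts):

```python
from datetime import date

# Sketch of the 2x2 performance metrics reported in the study, computed from
# confusion-matrix counts with NDI as ground truth. Counts are hypothetical.

def binary_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # NDI deaths the composite variable found
        "specificity": tn / (tn + fp),  # NDI-alive patients correctly not flagged
        "ppv":         tp / (tp + fp),  # flagged deaths that are real
        "npv":         tn / (tn + fn),  # unflagged patients who are truly alive
    }

def dates_agree(d1, d2, tolerance_days=15):
    """Death-date agreement within a +/- tolerance, as in the 0/15/30-day analysis."""
    return abs((d1 - d2).days) <= tolerance_days

m = binary_metrics(tp=820, fp=34, fn=180, tn=966)   # sensitivity here is 0.82
```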
Dennstaedt, F.; Bobnar, T.; Handra, A.; Putora, P. M.; Filchenko, I.; Brueningk, S.; Aebersold, D. M.; Cihoric, N.; Shelan, M.
Background: The growing volume of biomedical literature, especially in oncology, necessitates automated tools for extracting clinically relevant information. Large Language Models (LLMs) offer promising capabilities for data extraction in this domain. However, their potential to extract clinically relevant information from case reports detailing rare treatment interactions remains underexplored. Methods: We systematically searched PubMed for case reports on interactions between radiotherapy (RT) and Pembrolizumab, Cetuximab, or Cisplatin. A random sample of 100 report abstracts for each therapy was manually classified by two independent medical experts using 17 Boolean questions with mutually exclusive answers about patient demographics, treatment, cancer type, and outcome, forming a ground truth. An LLM-based system with open-source GPT models (GPT-OSS-120B and GPT-OSS-20B) was applied to classify these reports and the remaining dataset entries using the defined question structure. Performance of the LLM-based information extraction was evaluated using the standard classification metrics: accuracy, precision, recall, and F1-score. Results: The systematic searches yielded 320 (Pembrolizumab), 147 (Cetuximab), and 2055 (Cisplatin) publications. Inter-rater agreement for manual classification was high (Cohen's kappa = 0.87), though lower (0.60-0.80) for specific outcome and cancer type questions. The LLM-based classification (GPT-OSS-120B model) achieved high overall performance with an F1-score of 94.33% (95.83% accuracy, 93.69% precision, 94.98% recall). Performance was consistent across systemic therapies, with the smaller GPT-OSS-20B model showing similar results (F1-score 94.06%). Analysis of the entire datasets revealed that 56.02% of publications described patients who received both RT and systemic therapy. Proportions of positive and negative outcomes varied by therapy and sequencing. Conclusions: LLM-based classification systems demonstrate high accuracy and reliability for curating scientific case reports on RT and systemic therapy interactions. These findings support their potential for high-throughput hypothesis generation and knowledge base construction in oncology, particularly for underutilized case reports, with even smaller open-source models proving effective for such tasks.
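The reported figures are internally consistent: the F1-score is the harmonic mean of precision and recall, and plugging in the stated GPT-OSS-120B precision and recall recovers the stated F1.

```python
# F1-score as the harmonic mean of precision and recall. Substituting the
# reported GPT-OSS-120B figures (93.69% precision, 94.98% recall) recovers
# the stated F1-score of 94.33%.

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.9369, 0.9498)   # ~0.9433
```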
Hughes, N.; Hogenboom, J.; Carter, R.; Norman, L.; Gouthamchand, V.; Lindner, O.; Connearn, E.; Lobo Gomes, A.; Sikora-Koperska, A.; Rosinska, M.; Pogoda, K.; Wiechno, P.; Jagodzinska-Mucha, P.; Lugowska, I.; Hanebaum, S.; Dekker, A.; van der Graaf, W.; Husson, O.; Wee, L.; Feltbower, R.; Stark, D.
Background: Population-based cancer registers (PBCRs) are important for monitoring trends in cancer epidemiology, facilitating the implementation of effective cancer services. Adolescents and young adults (AYA) with cancer are a patient group with a unique set of needs, and the utility of PBCRs for AYA is limited by the lack of AYA-specific data items. STRONG AYA, an international multidisciplinary consortium, is addressing this through federated learning (FL) methodology and novel data visualisation concepts. A Core Outcome Set (COS) has been developed to measure outcomes of importance through clinical data and Patient Reported Outcomes (PROs). We describe how data from the Yorkshire Specialist Register of Cancer in Children and Young People (YSRCCYP), a PBCR in the UK, are being used within STRONG AYA, and how the subsequent analyses can guide patient consultations. Methods: Data from the YSRCCYP were imported into a Vantage 6 node, from which FL analyses are performed along with data provided by other consortium members. The results are extracted into the PROMPT software and integrated into patient electronic healthcare records. Results: Healthcare professionals can view the results of individual PROs at various time points and in comparison to summary analyses carried out within the STRONG AYA infrastructure. Results can be filtered by age, disease, country, and stage. Conclusion: We have demonstrated how a regional PBCR can contribute to a pan-European infrastructure, with the resulting analyses viewed to enhance patient consultations. Such analyses have the potential to be used for research and policy-making, improving outcomes for AYA.